Mutual Information Clustering for Efficient Mining of Fuzzy Association Rules with Application to Gene Expression Data Analysis
Abstract
Extracting fuzzy association rules that describe dependencies and interactions in large data sets, such as those arising in gene expression data analysis, poses very difficult combinatorial problems whose cost depends heavily on the size of the data. The paper describes a two-stage approach that yields computationally manageable solutions. The first stage clusters transactions that are more likely to be associated. The second stage, fuzzy association rule extraction, then confronts a significantly reduced problem. The clustering phase is carried out by a Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). The mutual information metric controls the development of the KSDG-SOM clusters: it allows the formation of data clusters that maximize the mutual information among transactions of the same cluster and minimize it between different clusters. In addition, the KSDG-SOM can incorporate a priori information about the items of a transaction, which can focus the model on clustering together items that are even more likely to be associated. After this initial clustering, we concentrate on whether the pattern of a transaction can be associated with characteristics of the patterns of other transactions in the same node. The fuzzy association rules are therefore extracted locally, on a per-cluster basis. The paper focuses on applying these techniques to mining gene expression data. However, the presented techniques can easily be adapted to the intelligent exploration of other data sets as well.
Keywords: Fuzzy Association Rules, Mutual Information, Clustering, Self-Organized Maps, Entropy, Genome Data Mining, Gene Expression Analysis
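The two quantities the abstract relies on, mutual information between expression patterns and the strength of a fuzzy rule inside one cluster, can be illustrated with a minimal sketch. This is not the KSDG-SOM or the paper's rule miner. It assumes each gene is a list of expression values across the same samples, uses a simple equal-width histogram estimate of mutual information, and uses a hypothetical ramp membership function `high` with the minimum as the fuzzy AND. The names `mutual_information`, `high`, and `fuzzy_rule` are illustrative only.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=3):
    """Histogram estimate of I(X;Y) in bits for two equal-length profiles."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant profile: everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    pxy = Counter(zip(dx, dy))        # joint bin counts
    px, py = Counter(dx), Counter(dy)  # marginal bin counts
    # sum over observed cells of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def high(v, lo=0.0, hi=1.0):
    """Hypothetical ramp membership for the fuzzy set 'highly expressed'."""
    return max(0.0, min(1.0, (v - lo) / (hi - lo)))


def fuzzy_rule(a, b):
    """Support and confidence of 'A is high -> B is high' (min as fuzzy AND)."""
    ma = [high(v) for v in a]
    mb = [high(v) for v in b]
    joint = sum(min(p, q) for p, q in zip(ma, mb))
    support = joint / len(a)
    confidence = joint / sum(ma) if sum(ma) else 0.0
    return support, confidence


if __name__ == "__main__":
    gene_a = [1.0, 0.5, 0.0]
    gene_b = [1.0, 1.0, 0.0]
    print(mutual_information(gene_a, gene_b, bins=2))
    print(fuzzy_rule(gene_a, gene_b))
```

In a two-stage scheme like the one described, a function such as `fuzzy_rule` would be evaluated only on pairs of profiles that fall in the same cluster, which is what reduces the combinatorial search.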
Similar resources
Developing a Course Recommender by Combining Clustering and Fuzzy Association Rules
Each semester, students go through the process of selecting appropriate courses. It is difficult to find information about each course and ultimately make decisions. The objective of this paper is to design a course recommender model which takes student characteristics into account to recommend appropriate courses. The model uses clustering to identify students with similar interests and skills...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Applying a decision support system for accident analysis by using data mining approach: A case study on one of the Iranian manufactures
Uncertain and stochastic states have been always taken into consideration in the fields of risk management and accident, like other fields of industrial engineering, and have made decision making difficult and complicated for managers in corrective action selection and control measure approach. In this research, huge data sets of the accidents of a manufacturing and industrial unit have been st...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relative limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...
Retaining Customers Using Clustering and Association Rules in Insurance Industry: A Case Study
This study clusters customers and finds the characteristics of different groups in a life insurance company in order to find a way for prediction of customer behavior based on payment. The approach is to use clustering and association rules based on CRISP-DM methodology in data mining. The researcher could classify customers of each policy in three different clusters, using association rules. A...
Journal title:
- International Journal on Artificial Intelligence Tools
Volume 15, Issue
Pages -
Publication date 2006